BIOLOGICAL DATA PROCESSING
FIELD OF THE INVENTION 
The present invention relates to automated database searching and in particular to
automated access to biological databases.
BACKGROUND OF THE INVENTION

One of the tasks performed in biological research is comparison of newly discovered
biological data with data stored in databases. Over two hundred public biological databases are
available around the world, many on the Internet. In general, databases include a plurality of
records which have the form of an object class. The object class is formed of a plurality of
fields, often in a hierarchy in which an object class includes one or more sub-object classes,
which in turn may include sub-sub-object classes. The records may represent, for example,
gene sequences and may have fields which include various data about the sequences, such as
their length, origin and a view of the sequence. Information is extracted from databases by
querying a management system associated with the database. A simple query includes a request
to display one or more fields of records which fulfill certain criteria.

The existing databases have different organization methodologies, e.g., different fields
in each record and different query schemes. In order to access these databases with ease, an
Object Protocol Model (OPM) suite of tools was developed. An OPM processor mediates
between a user and databases associated with the OPM suite. A common organization
methodology is used to represent the data in all the databases accessed via the OPM processor.
Queries addressed to databases via the OPM processor are provided by a user to the OPM
processor in a structured form, expressed in accordance with the common organization
methodology. The OPM processor translates the queries from the structured OPM form to
query forms compatible with the management systems of the specific databases to which the
queries are addressed. The results from the specific databases are returned to the OPM
processor, which translates the results back to the organization methodology of the OPM suite.
Not only does the OPM suite allow a user to access a plurality of different databases in
different forms, it also allows the user to access a plurality of databases using a single query.
For example, a complex query may request to display the records from a first database which
have a gene length greater than that of corresponding records of a second database which
represent the same organism.

The use of a common organization methodology across databases allows using special
tools for more easily generating queries and/or performing more complex queries. For example,
a graphic user interface (GUI) of the OPM suite allows the user to prepare a query in a
structured manner.

Some of the forms of biological data are complex data structures, such as gene
sequences, which require special procedures for manipulation, for example, for performing
comparisons. Homology search engines, such as BLAST, are used to compare gene sequences.
When a user wants to compare, for example, all the gene sequences classified in a certain
month to one or more groups of gene sequences, the user retrieves all the desired classified
gene sequences using OPM. Then, the user passes the retrieved data to a homology sequence
server which performs the sequence comparison.
SUMMARY OF THE INVENTION

One object of some preferred embodiments of the invention is to provide a method for
accessing data manipulation servers using a structured query format used to query databases.
Preferably, the accessing of manipulation servers is integrated with the accessing of database
information, for example by manipulating the results of the data access and/or by using the
results of the data manipulation as data to be accessed or for restricting queries.
An aspect of some preferred embodiments of the present invention relates to a multi-
database query system which receives queries which relate to both databases and data
manipulation servers, such as homology search engines. The queries relate to the data
manipulation servers as if they are database servers, allowing use of any tool of the multi-
database query system developed for database queries on queries which access data
manipulation servers. Such tools include, for example, database linking tools, graphic query
preparation tools and query optimization tools. By relating to databases and data manipulation
servers in a single query, the data manipulation server may process records as they are
provided, before the database runs through all its records. Alternatively or additionally,
results of a data manipulation step may be further queried. Thus, the response time required
for a complex query may be substantially reduced. Alternatively or additionally, the amount of
traffic on a network may be reduced and/or better spread out in time. Also, complex operations
may require less user intervention.

In some preferred embodiments of the present invention, the input to and/or output
from the data manipulation servers are modeled by structured objects. The modeled input
objects may result from processing other sections of the query. The modeled output objects
may be further processed by other sections of the query or even further manipulated by other
(or the same) manipulation servers.

In a preferred embodiment of the present invention, each data manipulation server
associated with the query system has a translation server which mediates between the data
manipulation server and the query system. The translation server receives commands from the
query system in the structured query form used by the query system and translates the
commands to a form in which the data manipulation server receives commands. The translation
server preferably also receives results from the data manipulation server and presents the
results to the query system in objects organized according to structured object classes used by
the query system.
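The mediation performed by such a translation server can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation described here. The structured directive is a plain dictionary with hypothetical keys. The native command syntax and the tab-separated result format are made up for illustration, and `fake_engine` stands in for a real data manipulation server.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StructuredObject:
    """A result in the query system's structured form (shape assumed for illustration)."""
    object_class: str
    fields: Dict[str, object] = field(default_factory=dict)


class TranslationServer:
    """Mediates between the query system and one data manipulation server."""

    def __init__(self, engine: Callable[[str], List[str]], object_class: str) -> None:
        self.engine = engine              # the server's native command interface
        self.object_class = object_class  # structured class used for results

    def translate_command(self, directive: Dict[str, str]) -> str:
        # Hypothetical native syntax; a real engine defines its own.
        return (f"{directive['command']} -query {directive['querySeq']} "
                f"-db {directive['dataSource']}")

    def wrap_results(self, raw_lines: List[str]) -> List[StructuredObject]:
        # Each raw line is assumed to be "accessor<TAB>length".
        objects = []
        for line in raw_lines:
            accessor, length = line.split("\t")
            objects.append(StructuredObject(self.object_class,
                                            {"accessor": accessor, "length": int(length)}))
        return objects

    def execute(self, directive: Dict[str, str]) -> List[StructuredObject]:
        """Structured directive in, structured objects out."""
        return self.wrap_results(self.engine(self.translate_command(directive)))


def fake_engine(native_command: str) -> List[str]:
    """Stand-in engine: ignores its command and returns two canned hits."""
    return ["HIT001\t350", "HIT002\t120"]


server = TranslationServer(fake_engine, "Output")
hits = server.execute({"command": "blastn", "querySeq": "ACGT", "dataSource": "dbEST"})
```

In this sketch the query system never sees the native command string. It only sees the directive it sent and the structured objects it gets back, which is the property the paragraph above relies on.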

There is thus provided, in accordance with a preferred embodiment of the invention, a multi-
database query system which queries a plurality of databases and servers, comprising:
an input which receives queries in a structured form; and

a translation server which translates at least a part of a received query into commands
recognized by a data manipulation server. Preferably, the system comprises a processor which
parses the received query into parts according to the databases and servers to which they relate.
Alternatively or additionally, the structured form comprises a form used to query databases.
Alternatively or additionally, the input receives a query which relates to at least one database
and at least one data manipulation server. Alternatively or additionally, the translation server
models results from the data manipulation server into database objects. Alternatively or
additionally, the data manipulation server comprises a server which receives input from at least
two different sources. Preferably, the data manipulation server comprises a homology
comparison engine.

There is also provided, in accordance with a preferred embodiment of the invention, a
method of accessing a data manipulation server from a multi-database query system,
comprising:

providing the query system with a query which includes a first directive assigning a
value to at least one field of an input object associated with the data manipulation server and a
second directive which determines a value of at least one field of an output object associated
with the data manipulation server; and

invoking the data manipulation server responsive to the second directive. Preferably,
providing the query comprises preparing the query using a graphical interface designed for
querying structured databases. Alternatively or additionally, the data manipulation server
comprises a homology engine.
There is also provided, in accordance with a preferred embodiment of the invention, a
method of performing a database search using a multi-database query system, comprising:

providing the query system with a query which includes at least one directive related to
a database and at least one directive related to a data manipulation server, wherein the
directives are stated in an identical structural format;

translating the directives into commands recognized by the database and the data
manipulation server; and

submitting the commands respectively to the data manipulation server and to the
database. Preferably, the data manipulation server comprises a homology comparison engine.
Alternatively or additionally, translating the directives comprises identifying, by a query
processor, the directives directed to the database and the directives directed to the data
manipulation server. Preferably, translating the directives comprises passing the directives to
translation servers associated with the database or data manipulation server to which the
directives are directed. Alternatively or additionally, the method comprises determining an
order for the directives to be processed in and submitting the translated directives to the data
manipulation server and to the database according to the determined order.

In a preferred embodiment of the invention, the method comprises receiving results
from said submission and translating the results into structured objects. Preferably, translating
the results into structured objects comprises translating the results to structured objects related
to the directives.

Alternatively or additionally, providing a query comprises providing a query in an
Object Protocol Model (OPM)-like language.
BRIEF DESCRIPTION OF FIGURES

The invention will be more clearly understood by reference to the following description
of preferred embodiments thereof in conjunction with the figures, in which:

Fig. 1 is a schematic illustration of a multi-database query system, in accordance with a
preferred embodiment of the invention; and

Fig. 2 is a flowchart of the actions performed by the multi-database query system of
Fig. 1, in accordance with a preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Fig. 1 is a schematic illustration of a multi-database query system 20, in accordance
with a preferred embodiment of the invention. System 20 mediates between an end-user 22
and a plurality of service providers which include databases 24 and one or more data
manipulation servers, such as a homology search engine 26. Error detection processes are
another example of data manipulation servers. Engine 26 is a data manipulation server in that it
provides processing services and is not primarily used for storing and providing information.
Preferably, engine 26 does not store information, and a user requesting processing services must
provide the information to be processed or must provide a link to a database which stores the
information. Data manipulation servers may receive a single input of data, e.g., error
detection processes which receive a single sequence, or a plurality of inputs, e.g., homology
engines which compare sequences from two different sources. One of the objects of some
preferred embodiments of the invention is to allow end-user 22 to relate to homology engine 26
and/or to other data manipulation servers as if they were databases 24.

Databases 24 may be organized differently from each other and are not generally
controllable by a supervisor of system 20. End user 22 provides system 20 with queries in a
query language of system 20, preferably a structured query language, such as OPM. Preferably,
a single query may be directed to more than one service provider. For example, a single query
may be directed to a plurality of databases 24 and to homology engine 26.

Preferably, system 20 comprises a graphical user interface 28 which receives queries in
a graphical form and translates them into the system's query language. Alternatively or
additionally, system 20 comprises a command-line interface 30 which receives commands from
end-user 22 directly in the system's query language or possibly using natural language. Further
alternatively or additionally, system 20 comprises a remote-unit interface 32 which receives
queries from remote computer units.

System 20 further comprises a multi-database query processor 34 which receives
queries from interfaces 28, 30 and/or 32 and processes them, as described hereinbelow.
Preferably, query processor 34 and interfaces 28, 30 and/or 32 are implemented on
a single computer 36 accessible to end-user 22. However, a distributed configuration can also
be used.

Preferably, system 20 further comprises, for each database 24, an OPM translation
server 38 that mediates between processor 34 and the respective service provider. Preferably,
translation servers 38 translate queries from the query language of system 20 into query
languages supported by the respective database 24. Further preferably, translation servers 38
translate query results received from the databases 24 into the structural object classes of
system 20.
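As an illustration of what a translation server 38 does, the sketch below assumes, purely for illustration, that the underlying database speaks SQL. It uses Python's built-in `sqlite3` module as a stand-in database. A query part restricted to one object class with equality conditions is translated into a parameterized `SELECT`, and the returned rows are translated back into objects keyed by field name. Real databases 24 differ in query language, and the function names here are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

Condition = Tuple[str, str]  # (field name, required value)


def translate_part(object_class: str, conditions: List[Condition]) -> Tuple[str, List[str]]:
    """Translate a structured query part into parameterized SQL.

    Class and field names are assumed to come from the directory, not from
    free user input, so they are interpolated; values are passed as parameters.
    """
    where = " AND ".join(f"{name} = ?" for name, _ in conditions) or "1 = 1"
    return f"SELECT * FROM {object_class} WHERE {where}", [value for _, value in conditions]


def run_part(conn: sqlite3.Connection, object_class: str,
             conditions: List[Condition]) -> List[Dict[str, object]]:
    sql, params = translate_part(object_class, conditions)
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    # Translate native rows back into structured objects (dicts keyed by field).
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# A toy "local" database holding a Fragments class.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fragments (fragId TEXT, sequence TEXT, finished TEXT)")
conn.executemany("INSERT INTO Fragments VALUES (?, ?, ?)",
                 [("F1", "ACGTACGT", "today"), ("F2", "GGCC", "yesterday")])
today = run_part(conn, "Fragments", [("finished", "today")])
```

Keeping values as SQL parameters rather than splicing them into the string is a deliberate choice in this sketch. It avoids quoting problems when user-supplied literals such as "today" pass through the translator.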
In a similar manner, system 20 preferably comprises an OPM translation server 42
which mediates between processor 34 and homology engine 26. Translation server 42
preferably translates query portions from the query language of system 20 into commands
supported by homology engine 26. That is, the OPM language allows, in accordance with
preferred embodiments of the invention, phrasing queries that access homology engine 26 as a
database. Translation server 42 translates query directives, such as limitations, into commands
to be performed by homology engine 26. In addition, translation server 42 preferably translates
the output from homology engine 26 into structural objects, in accordance with the query
language used by system 20. An exemplary structural definition of objects used to access a
homology engine from the OPM suite is described in appendix A. The structural definition of
appendix A is written in a language used to define OPM objects, described for example in
Chen, I.A.; Kosky, A.S.; Markowitz, V.M.; Szeto, E.; and Topaloglou, T., 1998, "Advanced
Query Mechanisms for Biological Databases," in Proceedings of the 6th International
Conference on Intelligent Systems for Molecular Biology (ISMB'98), the disclosure of which is
incorporated herein by reference.
Alternatively or additionally, a single translation server 38 may be used for more than
one service provider. Alternatively or additionally, OPM processor 34 performs some or all of
the translation tasks of translation servers 38 and 42. Preferably, OPM servers 38 and 42 are
situated on the same computer as their respective service providers 24 and 26. Alternatively,
OPM servers 38 and 42 are located on computers proximal to their respective service providers
24 and 26, although translation servers may be located substantially anywhere.
Preferably, a multi-database directory 40 is used by processor 34 to determine to which
service provider 24 and 26 the portions of a query are directed. Directory 40 preferably
summarizes the contents, organization methodologies and capabilities of databases 24 and
engines 26. In a preferred embodiment, a single directory is used for a plurality of query
processors 34, such that adding additional service providers to system 20 requires only
preparing a respective OPM server for the additional service provider and updating directory
40, while no changes are needed in processors 34.
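The role of directory 40 can be illustrated with a small registry. In this hypothetical Python sketch, the directory maps each service-provider name used in queries to its kind, the object classes it offers and the address of its OPM server. The entry fields and addresses are assumptions made for illustration. Registering a new provider touches only the directory, mirroring the point that no changes are needed in processors 34.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DirectoryEntry:
    """Hypothetical summary of one service provider kept in directory 40."""
    kind: str                                   # "database" or "data_manipulation"
    object_classes: Set[str] = field(default_factory=set)
    translation_server: str = ""                # address of its OPM server (made up)


class Directory:
    def __init__(self) -> None:
        self._entries: Dict[str, DirectoryEntry] = {}

    def register(self, name: str, entry: DirectoryEntry) -> None:
        """Adding a provider only updates the directory; processors are unchanged."""
        self._entries[name] = entry

    def resolve(self, qualified_class: str) -> DirectoryEntry:
        """Map a query reference such as 'local:Fragments' to its provider entry."""
        provider, _, object_class = qualified_class.partition(":")
        entry = self._entries.get(provider)
        if entry is None or object_class not in entry.object_classes:
            raise KeyError(f"unknown object class {qualified_class!r}")
        return entry


directory = Directory()
directory.register("local", DirectoryEntry("database", {"Fragments"}, "opm-local"))
directory.register("blast", DirectoryEntry("data_manipulation", {"Blast_Call"}, "opm-blast"))
```

Because a pseudo database such as "blast" is registered exactly like a real one, the processor can resolve both kinds of references with the same lookup.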

In a preferred embodiment of the present invention, the various components of system
20 interact using a distributed object technology, such as the Common Object Request Broker
Architecture (CORBA), which is described, for example, in the Web site of the "Object
Management Group" (OMG) at www.omg.org and was available on June 27, 1999. The
disclosure of this web site is incorporated herein by reference. Preferably, a plurality of
different CORBA interfaces are used in system 20 for different types of interactions between
the components of system 20. In one example, a first CORBA interface is used for
programming and a second interface is used for object transfer and/or sharing. Preferably,
remote-unit interface 32 also comprises a CORBA interface.

Alternatively or additionally, other distributed-object technologies, such as Microsoft's
Component Object Model (COM) or the UNIX environment Remote Procedure Call (RPC),
may be used to allow remote and/or non-remote components of system 20 to interact. Further
alternatively or additionally, system 20 may be implemented in its entirety by a single process
and/or on a single processor.





Table 1

(1)   SELECT  l = r.fragId, a = h.accessor
(2)   FROM    r in local:Fragments
(3)           bc in blast:Blast_Call
(4)           bo in bc.output
(5)           h = bo.summary.sequence
(6)   WHERE   r.finished = "today" and
(7)           bc.querySeq = r.sequence and
(8)           bc.command = "blastn" and
(9)           bc.dataSource = "dbEST" and
(10)          h.length > 300

Table 1 illustrates a sample query received by query processor 34 from any of interfaces
28, 30 and 32. The query in table 1 is written according to the OPM query language described,
for example, in the ISMB'98 publication referenced hereinabove. This OPM query language
allows accessing a plurality of databases 24 from a single query. The query of table 1 relates to
both a database 24 and a homology engine 26, the homology engine being accessed as if it
were a database.



The query in table 1 is built of three sections. A first section, labeled SELECT, states the
fields which are to appear in the output generated responsive to the query. In table 1 these
fields are a "fragId" field of a variable r and an "accessor" field of a variable h (the variables r
and h are defined in the second section). A second section, labeled FROM, defines the
variables mentioned in the query by stating the database object classes to which they relate.
That is, the second section states which objects are candidates for fulfilling the query.

In table 1, the variable r, for example, corresponds to a "Fragments" object class in a
database named "local". In the same way, a dummy variable "bc" corresponds to an object class
named "Blast_Call" in a pseudo database "blast". However, unlike variable r, which represents
an actual field of data in a database 24, variable "bc" does not represent any such field, and a
database "blast" does not actually exist.

Rather, when the "blast" database is referred to in a query, processor 34 refers to
homology engine 26. Translation server 42 preferably performs any required translations to the
input and output of homology engine 26, such that the homology engine appears to processor
34 as a database. Preferably, the entire interface with homology engine 26 is structured in a
single translation object, for example, in accordance with the "Blast_Call" object class in table
1, which is defined in appendix A. The translation object includes the input to and output from
homology engine 26. For example, the "Blast_Call" object class has fields which relate to the
commands to engine 26, such as a "command" field which states the type of command
performed by engine 26, a "querySeq" field which states an input sequence to be compared by
engine 26, and a "dataSource" field which states a database of sequences to which the input
sequence is compared. In addition, the "Blast_Call" object class has an "output" field into
which the output from homology engine 26 is preferably structurally stored. In the query of
table 1, a dummy variable "bo" refers to the sub-object "output", thus simplifying the query
statements.
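Appendix A is not reproduced here, so the following Python dataclasses are only a hypothetical rendering of the Blast_Call structure. They are limited to the fields named in the text (command, querySeq, dataSource, output) and in table 1 (summary, sequence, accessor, length). The camelCase field names mirror the query rather than Python convention. The helper `bind_h` is an illustrative name showing how the dummy variables bo and h walk the output sub-object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sequence:
    accessor: str   # accession of the matched database entry
    length: int     # length of the matched sequence


@dataclass
class Summary:
    sequence: Sequence


@dataclass
class Output:
    summary: Summary


@dataclass
class BlastCall:
    """Pseudo-database object: input fields set by the query, output filled by the engine."""
    command: str        # e.g. "blastn"
    querySeq: str       # input sequence, bound from r.sequence in table 1
    dataSource: str     # sequence database compared against, e.g. "dbEST"
    output: List[Output] = field(default_factory=list)


def bind_h(call: BlastCall) -> List[Sequence]:
    """Mimics 'bo in bc.output' followed by 'h = bo.summary.sequence'."""
    return [bo.summary.sequence for bo in call.output]


call = BlastCall("blastn", "ACGTACGT", "dbEST",
                 [Output(Summary(Sequence("X1", 420))), Output(Summary(Sequence("X2", 150)))])
```

The input fields are ordinary constructor arguments, while `output` starts empty. This reflects the idea that the engine, not the query, fills it in.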

When a query relates to an action, such as a search or a filter to be performed in a
pseudo database, processor 34 first has the respective engine 26 perform any required
commands to fill up the output fields of the object representing the pseudo database, e.g.,
"Blast_Call", and only then is the search performed. Alternatively or additionally, as the output
records become available from homology engine 26, they are sent for further processing. In
some cases, the records can be processed even before all the fields are available from engine
26. One example of a query optimization as applied to data manipulation servers is that the
query translator instructs the engine to provide only those result fields which are actually
required for further processing or display. Another example of optimization is allowing some
of the fields to be provided at a later time than other fields. Modifying the order of generation
of fields, even between records, may be useful if some fields are required for further data
manipulation or for querying against a slow database and are thus time critical. For some
types of data manipulation, it may even be useful to start the manipulation with only part of the
fields and then repeat the manipulation with the rest of the fields. One example where it is
useful to start manipulating before all the fields are available is where the manipulation can be
carried out, at least to some extent, without the field, or where the value of the field or the range
of possible values of the field can be known. Thus, for example, a DNA homology can be
failed based on both of the strands not matching, even before it is known which strand needs to
be matched. Once the strand information is available, the group of accepted matches can be
further limited using that information.
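The strand example can be made concrete with a small Python sketch. It assumes, purely for illustration, that exact substring matching stands in for a real homology score and that the strand arrives later as "+" or "-". The function names are hypothetical. The first pass keeps a candidate if the query matches either strand. The second pass runs once the strand is known and only re-examines the survivors.

```python
from typing import Dict, List, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def matches(query: str, target: str, strand: Optional[str]) -> bool:
    """Exact-substring stand-in for a homology test.

    strand=None means the strand field is not yet available, so a match on
    either strand keeps the candidate; no match on both strands fails it early.
    """
    forward = query in target
    reverse = reverse_complement(query) in target
    if strand is None:
        return forward or reverse
    return forward if strand == "+" else reverse


def prefilter(query: str, targets: Dict[str, str]) -> List[str]:
    # First pass, run before the strand field is available.
    return [name for name, seq in targets.items() if matches(query, seq, None)]


def refine(query: str, targets: Dict[str, str], survivors: List[str], strand: str) -> List[str]:
    # Second pass, once the strand is known: only prefiltered candidates are re-checked.
    return [name for name in survivors if matches(query, targets[name], strand)]
```

For example, with query "ACGG" the first pass keeps a target containing "ACGG" and a target containing its reverse complement "CCGT". It rejects a target containing neither, and the second pass then keeps only the target on the known strand.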

Thus, system 20 can have different parts of a query evaluated in parallel, in particular
time-consuming parts performed by data manipulation servers. For example, homology engine
26 may begin to operate as records from another part of a query become available, and/or the
output from engine 26 may be processed as it is provided, without waiting for all the results.
This parallelism is possible because homology engine 26 is accessed from within the query. An
advantage of some embodiments of the invention is the savings in response time and in
communication and CPU resources of complex queries due to this parallelism.
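The record-at-a-time overlap described above can be sketched with Python generators, where each stage pulls a record from the previous stage as soon as it exists. This is an assumption-laden illustration. The stage functions, the event log and the faked hit length are hypothetical. Generators show the interleaving in a single thread rather than true concurrency; in system 20 the stages would run on separate servers.

```python
from typing import Iterable, Iterator, List, Tuple

events: List[str] = []  # records the order in which stages touch each record


def database_stage(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yields (fragId, sequence) records one at a time, as a database would stream them."""
    for frag_id, seq in rows:
        events.append(f"db:{frag_id}")
        yield frag_id, seq


def engine_stage(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Starts on each record as soon as it arrives (hit length faked as len(seq) * 100)."""
    for frag_id, seq in records:
        events.append(f"engine:{frag_id}")
        yield frag_id, len(seq) * 100


def filter_stage(hits: Iterable[Tuple[str, int]], min_length: int) -> Iterator[str]:
    """Processes engine output as it is provided, without waiting for all results."""
    for frag_id, length in hits:
        if length > min_length:
            events.append(f"select:{frag_id}")
            yield frag_id


selected = list(filter_stage(engine_stage(database_stage([("f1", "ACGT"), ("f2", "AC")])), 300))
```

After running, `events` shows that the engine and the filter finish with f1 before the database has even produced f2. This is the overlap that saves response time.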

In some cases, such parallel processing of data manipulation may require the data
manipulation server, or the data manipulation program involved, to take partial information
into account. In one example, a blast server may associate the actual partial information used
with a result record set, so that it can further limit the search results after the fact.



A third section of the query, labeled WHERE, states the conditions to be fulfilled by
those objects selected by the query. In table 1 these conditions include that a field named
"finished" of the variable r must have a value "today", that a field "querySeq" of the variable bc
must have a value equal to the value of the field "sequence" of the variable r, etc. In this
section, the conditions on database objects and on pseudo database objects are preferably stated
in substantially the same way.

Fig. 2 is a flowchart of the actions performed in processing a query by system 20, in
accordance with a preferred embodiment of the present invention. Upon receiving a query, such
as the query in table 1, processor 34 divides (60) the query into parts which are performed by
the various service providers 24 and 26. Processor 34 preferably determines, for example using
methods known in the art, to which service provider each line in the query is directed.
Preferably, the determination is performed by reference to directory 40. In the query of table 1,
processor 34 determines from the second line that variable r is to be searched in the database
named "local". From the third line it is determined that variable bc is to be "searched" in engine
26, named "blast". Therefore, lines 2 and 6 of the query are directed to the database "local",
while lines 3, 7, 8 and 9 are directed to homology engine 26. Lines 1, 4, 5 and 10 do not relate
to any database and therefore they are preferably processed by processor 34.
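Step 60 can be sketched for the query of table 1. In this hypothetical Python sketch, the query is assumed to be pre-parsed into (line, kind, variable, target) tuples, and `KNOWN_PROVIDERS` stands in for a lookup in directory 40. Each WHERE line is assigned by the variable on its left-hand side, which simplifies what a real query processor would do. With these assumptions the function reproduces the assignment of lines described above.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical pre-parsed form of table 1: (line, kind, variable, target).
# "bind" lines introduce a variable; "cond" lines constrain the variable on the left.
QUERY: List[Tuple[int, str, Optional[str], Optional[str]]] = [
    (1, "select", None, None),
    (2, "bind", "r", "local:Fragments"),
    (3, "bind", "bc", "blast:Blast_Call"),
    (4, "bind", "bo", "bc.output"),
    (5, "bind", "h", "bo.summary.sequence"),
    (6, "cond", "r", None),     # r.finished = "today"
    (7, "cond", "bc", None),    # bc.querySeq = r.sequence
    (8, "cond", "bc", None),    # bc.command = "blastn"
    (9, "cond", "bc", None),    # bc.dataSource = "dbEST"
    (10, "cond", "h", None),    # h.length > 300
]

KNOWN_PROVIDERS = {"local", "blast"}  # stand-in for a lookup in directory 40


def divide(query) -> Dict[str, List[int]]:
    """Step 60: assign each query line to a service provider or to the processor."""
    provider_of: Dict[str, str] = {}
    parts: Dict[str, List[int]] = defaultdict(list)
    for line, kind, var, target in query:
        provider = "processor"
        if kind == "bind" and ":" in target:
            prefix = target.split(":", 1)[0]
            if prefix in KNOWN_PROVIDERS:
                provider_of[var] = prefix   # remember which provider holds this variable
                provider = prefix
        elif kind == "cond":
            provider = provider_of.get(var, "processor")
        parts[provider].append(line)
    return dict(parts)
```

Sub-object bindings such as "bo in bc.output" carry no provider prefix. They therefore fall to the processor, which matches lines 4 and 5 being handled by processor 34.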

Processor 34 then preferably determines (62) the cross-dependence of the parts of the
query, i.e., which parts require data from other parts and therefore must receive the data from
the other parts before they are performed. In table 1, it is determined that the query part
directed to homology engine 26 requires output from another query part, namely the part
directed to database "local", since line 7 sets bc.querySeq from r.sequence.

Thereafter, processor 34 sends (64) to OPM translation servers 38 and/or 42 a first
round of query parts belonging to their respective service providers 24 and 26. The query parts
sent in the first round are those which do not require results from other query parts. In table 1,
the part relating to variable r, i.e., lines 2 and 6, is sent to the OPM server 38 of database
"local". These lines designate a query for all the Fragments objects in the database which have
a value "today" in their "finished" field. The OPM server translates (66) the received query part
into a language recognized by database "local". The translated query part is passed to the
database 24, which performs (68) the query and returns (70) the results of the query to the OPM
server 38. The OPM server 38 translates (72) the results received from the database 24 into the
OPM result format and passes the translated results to processor 34.

If (74) the query includes additional query parts which were not performed yet, e.g.,
query parts dependent on results from other queries, steps 64, 66, 68, 70 and 72 are repeated for
the additional query parts. In the example of table 1, the query part formed of lines 3, 7, 8 and 9
is passed to the translation server 42 of homology engine 26. The translation server 42
translates (66) the query part into commands performed by homology engine 26. For each
sequence of variable r in the output of database "local", translation server 42 sends a "blastn"
command to engine 26 to perform a homology comparison between the sequence and the
database "dbEST". The results received from engine 26 are preferably summarized (72) by
translation server 42 in the "output" field of the "Blast_Call" object.
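The cross-dependence analysis (62) and the repeated rounds (64 to 74) amount to evaluating query parts in dependency order. The following is a minimal sketch, assuming the dependencies have already been extracted into a mapping from part names (hypothetical labels such as "local" and "blast") to the parts they need. Each round submits every part whose prerequisites are complete, which is a layered topological sort. It does not model starting a round before the previous one finishes.

```python
from typing import Dict, List, Set


def schedule_rounds(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group query parts into rounds (a layered topological sort).

    depends_on maps each query part to the parts whose output it needs,
    i.e. the cross-dependence found in step 62.
    """
    remaining = {part: set(deps) for part, deps in depends_on.items()}
    done: Set[str] = set()
    rounds: List[List[str]] = []
    while remaining:
        # Parts whose prerequisites have all completed can be sent now (step 64).
        ready = sorted(part for part, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic dependency between query parts")
        rounds.append(ready)
        # Stand-in for steps 66-72: the round's results come back and are translated.
        done.update(ready)
        for part in ready:
            del remaining[part]
    # Step 74: the loop repeats until no query parts remain.
    return rounds
```

For table 1 the mapping is `{"local": set(), "blast": {"local"}}`, which yields two rounds: the database part first, then the homology part.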

In some preferred embodiments of the present invention, system 20 begins a second
round of processing query parts before a first round on which the second round depends is
finished. That is, as the first round provides records as results, the second round can begin
manipulating them.
Once all the query parts have been handled by their respective service providers 24 and 26,
processor 34 performs (76) any remaining operations in the queries and provides (78) the user
with the results required in the SELECT section of the query. In the example of table 1,
processor 34 performs the comparison in line 10 of the query. Variable h refers to the field
"sequence" of the sub-object "summary" of the object "output", which represents the results
from the blast comparison. Sequences having a length greater than 300 are selected from the
blast results. The user is then provided with the value of the "accessor" field of the variable h
and the value of the "fragId" field of the variable r, for all the objects which fulfill the
query.
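For table 1, the remaining work of processor 34 is a filter followed by a projection. In the hypothetical Python sketch below, each (r, h) pair is assumed to have already been joined from the two rounds. Both are represented as plain dictionaries keyed by the field names used in the query, and the output keys `l` and `a` follow the aliases in the SELECT line.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def finish(joined: List[Tuple[Record, Record]], min_length: int = 300) -> List[Record]:
    """Steps 76-78 for table 1: apply line 10, then project the SELECT fields of line 1."""
    results = []
    for r, h in joined:
        if h["length"] > min_length:                                 # line 10
            results.append({"l": r["fragId"], "a": h["accessor"]})   # line 1
    return results
```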

The above description has focused on BLAST as a homology method; however, other
types of homology servers may also be used, for example BLASTX, BLASTN and BLASTP.
Additionally, other types of data manipulation may be provided, for example, error correction,
in which a sequence is corrected for various types of errors. Another type of data manipulation
server is, for example, a server which guesses a tertiary structure of a protein from its sequence,
for example the number of alpha helices or the protein's affinity to a certain DNA sequence.
Alternatively to guessing the structure, the server may provide a grading facility which grades a
list of provided sequences for affinity to the protein (or for similarity of their derived protein)
or which selects those sequences which have a certain affinity.

As can be appreciated, some of these data manipulation servers require only one input
record set, while others require more than one input record set. For example, a homology
search can compare a first set of records against records in a second database (a fixed value) or
against a second set of provided records. In some cases, three or more inputs may be provided,
for example, where a third record set includes a list of rules which apply when comparing the
two record sets. In some cases, all the record sets need to be fully specified before the
manipulation can be performed. In other cases, only one or possibly some of the record sets
need to be fully specified before starting, so the optimizing and performing in parallel
described above can be applied to the availability of record sets as well.
In a preferred embodiment of the invention, the definitions of how the data manipulation server
operates in the absence of data and/or the relative computation time for different tasks
performed thereby are stored in directory 40, preferably along with other information useful for
optimizing queries which include data manipulation.
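A directory entry of this kind can be sketched as follows. The profile fields, the input names and the cost value are hypothetical, chosen only to illustrate how stored definitions could let the processor decide when a manipulation may start. In the example, a homology search may stream its query records but needs its fixed target set complete before it begins.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ServerProfile:
    """Hypothetical directory-40 entry describing a server's input record sets."""
    inputs: FrozenSet[str]            # names of all input record sets
    must_be_complete: FrozenSet[str]  # sets that must be fully specified first
    relative_cost: float              # relative computation time, for the optimizer


def can_start(profile: ServerProfile, available: Dict[str, bool]) -> bool:
    """Decide whether the manipulation may begin.

    `available` maps an input name to True once that record set is fully
    specified, or to False while its records are still arriving; inputs that
    have not started arriving are absent.
    """
    for name in profile.inputs:
        if name not in available:
            return False              # this input has not started arriving yet
        if name in profile.must_be_complete and not available[name]:
            return False              # this input must be complete but is still streaming
    return True


# Illustrative profile: the cost figure is made up.
homology = ServerProfile(frozenset({"query_set", "target_set"}),
                         frozenset({"target_set"}), relative_cost=50.0)
```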

An advantage of some of the above preferred embodiments is that it is possible to use
substantially any tool developed for manipulation of databases to access data manipulation
servers. For example, graphic interface 28 may be an interface developed solely for preparing
queries for database servers, as described, for example, in Kosky, A.S., Chen, I.A., Markowitz,
V.M., and Szeto, E., "Exploring Heterogeneous Biological Databases: Tools and Applications,"
Proceedings of the 6th International Conference on Extending Database Technology
(EDBT'98), Lecture Notes in Computer Science, Vol. 1377, Springer-Verlag, 1998, pp. 499-
513, the disclosure of which is incorporated herein by reference. A user may use this interface
to prepare sophisticated queries which include access to data manipulation servers, such as
homology search engines.

Likewise, optimization tools designed for database queries may be applied, in
accordance with the above preferred embodiments, to queries which include reference to data
manipulation servers. Such optimization is especially important for queries which reference
data manipulation servers, because these servers usually require substantially more processing
time than databases.

Furthermore, the results of the queries are preferably provided in a single common
format which allows use of a single standard output interface to display the results.

In addition, variables representing database and pseudo database objects may be linked
together using methods for linking databases described, for example, in the EDBT'98
publication referenced hereinabove. These linking methods allow simpler statement of queries
and provide transparency to the user, who does not need to know the servers used.

Although the above described embodiments refer to queries which relate to data
manipulation servers as to databases, some embodiments of the invention relate to queries
which include commands to be performed by data manipulation servers, not necessarily in the
same manner in which databases are searched. For example, a query may include an explicit
command to be carried out by a data manipulation server, e.g., homology engine 26. Such
commands are referred to herein as application specific data type (ASDT) commands.

Table 2 shows a query similar to the query of Table 1 in which homology engine 26 is
activated using explicit commands written in a format acceptable to OPM processor 34. Line 6
in Table 2 is a command to perform "blast" on the "sequence" fields of the possible values of
variable r. The blast is performed against a database "dbEST". The results from performing the
blast command appear in a variable b, which is defined in line 3 of Table 2.

Table 2

(1) SELECT l = r.fragId, a = h.accession
(2) FROM r in local:Fragments
(3)      b in blast:Output
(4)      h in b.summary.sequence
(5) WHERE r.finished = "today" and
(6)      r.sequence.blast(dbEST) and
(7)      b.query = r.sequence and
(8)      h.length > 300
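As an illustration of what the Table 2 query computes, the Python sketch below evaluates it naively over in-memory data. It assumes that each fragment has fragId, finished and sequence fields, that h ranges over the hits returned by the blast call on r.sequence, and that each hit carries accession and length fields. The fake_blast function and its canned data are hypothetical stand-ins for homology engine 26, not a real BLAST interface.

```python
from typing import Callable, Dict, List, Tuple

Fragment = Dict[str, str]
Hit = Dict[str, object]
BlastFn = Callable[[str, str], List[Hit]]


def evaluate_table2(fragments: List[Fragment], blast: BlastFn,
                    database: str = "dbEST") -> List[Tuple[str, object]]:
    results: List[Tuple[str, object]] = []
    for r in fragments:                             # r ranges over local fragments
        if r["finished"] != "today":                # r.finished = "today"
            continue
        for h in blast(r["sequence"], database):    # blast r.sequence against dbEST
            if h["length"] > 300:                   # h.length > 300
                results.append((r["fragId"], h["accession"]))  # SELECT l, a
    return results


# Hypothetical canned hits keyed by query sequence.
CANNED: Dict[str, List[Hit]] = {
    "ACGTACGT": [{"accession": "EST001", "length": 450},
                 {"accession": "EST002", "length": 120}],
    "TTGGCCAA": [{"accession": "EST003", "length": 600}],
}


def fake_blast(sequence: str, database: str) -> List[Hit]:
    # Stand-in for the homology engine: look up canned hits for the sequence.
    return CANNED.get(sequence, [])


fragments = [
    {"fragId": "F1", "finished": "today", "sequence": "ACGTACGT"},
    {"fragId": "F2", "finished": "yesterday", "sequence": "TTGGCCAA"},
]
```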

In a preferred embodiment of the present invention, when processor 34 encounters an
ASDT command, such as the "blast" command on line 6, it first checks with the database
involved, i.e., the "local" database, whether the database supports the command in the specific
syntax. Then, processor 34 consults directory 40 to determine a server which has the routine
invoked by the command. Processor 34 passes the ASDT command, with whatever data objects
to which the command relates, directly to the determined server. Alternatively, the command is
passed through translation server 42. The output from the server is preferably passed to
processor 34 in a structured form, as described above, so as to allow easy manipulation of the
results. In this embodiment, processor 34 does not model homology engine 26 as a database 24,
but does access the homology engine from within a complex query which accesses databases.
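The dispatch steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it assumes that a command the database supports natively is executed there, and the Database and Directory classes, the translator callable and the toy "blast" routine are hypothetical stand-ins for the database, directory 40, translation server 42 and homology engine 26.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Set, Tuple

Server = Callable[..., Any]
Translator = Callable[[Server, str, Tuple[Any, ...]], Any]


@dataclass
class Database:
    name: str
    supported_commands: Set[str] = field(default_factory=set)

    def run(self, command: str, args: Tuple[Any, ...]) -> Any:
        # Placeholder for native execution inside the database.
        return ("native", self.name, command, args)


@dataclass
class Directory:
    # Hypothetical registry mapping routine names to the servers that implement them.
    routines: Dict[str, Server] = field(default_factory=dict)

    def lookup(self, routine: str) -> Optional[Server]:
        return self.routines.get(routine)


def dispatch_asdt(command: str, args: Tuple[Any, ...], database: Database,
                  directory: Directory, translator: Optional[Translator] = None) -> Any:
    # 1. Ask the database involved whether it supports the command in this syntax.
    if command in database.supported_commands:
        return database.run(command, args)
    # 2. Consult the directory for a server that has the invoked routine.
    server = directory.lookup(command)
    if server is None:
        raise LookupError(f"no server registered for ASDT command {command!r}")
    # 3. Pass the command and its data objects through a translator if one is given,
    #    otherwise directly to the server.
    if translator is not None:
        return translator(server, command, args)
    return server(*args)


directory = Directory({"blast": lambda seq, db: [{"query": seq, "db": db, "accession": "EST001"}]})
local = Database("local")
hits = dispatch_asdt("blast", ("ACGT", "dbEST"), local, directory)
```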

The ASDT commands do not necessarily appear in the WHERE section of the query.
Table 3 shows a query in which a command appears in the SELECT section of the query. The
command is processed after the query is evaluated, at a stage of presenting the results of the
query.

Table 3



(1) SELECT x.gelId,



*B (3) FROmtr xinG^'w 

(4) WHERE x.gelId = "gel_P001"

In Table 3, the "image" field of the variables x which satisfy the query is passed to a
routine "crop", which returns a piece of an image having specified coordinates. The results
from the routine "crop" are passed to a routine "display", which displays the result in any
desired manner.
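A minimal Python sketch of this evaluation order: the WHERE clause is applied first, and the SELECT-clause routines run only on the matching objects, at the result-presentation stage. Images are modeled as lists of pixel rows; the crop coordinate convention (half-open row and column ranges), the text-based display routine and the sample gel identifiers are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List

Image = List[List[int]]
Obj = Dict[str, Any]


def crop(image: Image, top: int, left: int, bottom: int, right: int) -> Image:
    """Return the sub-image with rows [top, bottom) and columns [left, right)."""
    return [row[left:right] for row in image[top:bottom]]


def display(image: Image) -> str:
    """Render an image as text; a stand-in for any desired display manner."""
    return "\n".join(" ".join(str(v) for v in row) for row in image)


def run_query(objects: List[Obj], where: Callable[[Obj], bool],
              select: Callable[[Obj], Any]) -> List[Any]:
    # Evaluate the query first ...
    matched = [x for x in objects if where(x)]
    # ... then apply the SELECT-clause routines while presenting the results.
    return [select(x) for x in matched]


gels = [
    {"gelId": "gel-A", "image": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]},
    {"gelId": "gel-B", "image": [[0]]},
]
shown = run_query(gels,
                  where=lambda x: x["gelId"] == "gel-A",
                  select=lambda x: display(crop(x["image"], 0, 1, 2, 3)))
```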

The routines referenced by the ASDT commands may be evaluated by a data
manipulation server, as described above with reference to the blast command evaluated by
homology engine 26. Alternatively or additionally, some routines may be situated within
processor 34 or in directory 40. Stating the commands within a query, rather than invoking the
commands on the results received from a query, is simpler for the user. In addition,
invoking the commands from within the query allows performing the command before the
results are passed to end-user 22. In many cases this conserves substantial communication
resources.

Users accessing databases are frequently interested in attributes which may be
extracted from the image of a complex data field, for example, a gel. Such attributes include,
for example, the length of an image of the gel, its average intensity or specific lanes of the
image. Therefore, some databases have redundant data fields which store values for these
attributes. By using ASDT commands these redundant fields are not needed, since the attributes
can be computed from the image when requested. The routines invoked by the ASDT commands
may be stored in database 24, on a separate data manipulation server, in directory 40 and/or in
processor 34.
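A minimal Python sketch of computing such attributes on demand rather than storing them. The definitions are assumptions: "length" is taken to be the number of pixel rows, lanes are assumed to be equal-width vertical strips, and the ROUTINES registry and its attribute names are hypothetical.

```python
from typing import Callable, Dict, List, Union

Image = List[List[int]]
Value = Union[int, float, Image]


def image_length(image: Image) -> int:
    # Assumption: the "length" of a gel image is its number of pixel rows.
    return len(image)


def average_intensity(image: Image) -> float:
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count if count else 0.0


def lane(image: Image, index: int, width: int) -> Image:
    # Assumption: lanes are equal-width vertical strips, numbered from 0.
    start = index * width
    return [row[start:start + width] for row in image]


# Hypothetical routine registry consulted instead of reading stored, redundant fields.
ROUTINES: Dict[str, Callable[[Image], Value]] = {
    "length": image_length,
    "avgIntensity": average_intensity,
    "lane1": lambda img: lane(img, 1, 2),
}


def derived(image: Image, attribute: str) -> Value:
    # Compute the requested attribute from the image at query time.
    return ROUTINES[attribute](image)


gel = [[10, 20, 30, 40],
       [50, 60, 70, 80]]
```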

It is noted that the ASDT commands may be invoked implicitly, as described above
with reference to Fig. 2. For each command, a command data object is preferably defined
which includes input and output fields of the command. An access to an output field of the
object is translated by system 20 as an implicit invocation of the command.
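A minimal Python sketch of such a command data object, in which reading the output field triggers the command. The CommandObject class and the toy_blast routine are hypothetical; caching the output until an input field changes is a design choice made here for illustration, not something the description specifies.

```python
from typing import Any, Callable, Dict


class CommandObject:
    """Hypothetical command data object: input fields plus a lazily computed output field."""

    def __init__(self, routine: Callable[..., Any], **inputs: Any) -> None:
        self._routine = routine
        self._inputs: Dict[str, Any] = dict(inputs)
        self._output: Any = None
        self._valid = False
        self.invocations = 0

    def set_input(self, name: str, value: Any) -> None:
        # Assigning an input field invalidates any previously computed output.
        self._inputs[name] = value
        self._valid = False

    @property
    def output(self) -> Any:
        # Reading the output field is the implicit invocation of the command.
        if not self._valid:
            self._output = self._routine(**self._inputs)
            self._valid = True
            self.invocations += 1
        return self._output


def toy_blast(query: str, database: str) -> str:
    # Stand-in routine; a real system would call the data manipulation server here.
    return f"{database}:{query[::-1]}"


call = CommandObject(toy_blast, query="ACGT", database="dbEST")
```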

It will be appreciated that the above described methods may be varied in many ways,
including changing the order of steps and the exact implementation used. It should also be
appreciated that the above description of methods and apparatus is to be interpreted
as including apparatus for carrying out the methods and methods of using the apparatus.
In particular, the above methods should be interpreted to describe software for carrying out a
complete method as described above or a part thereof, or software which modifies existing
software to perform as described above. In addition, the scope of the invention includes such
software stored in a computer readable medium, such as a disk, stored in a memory or executing
on a computer.

It is noted that the above described embodiments are given by way of example only,
and the scope of the invention is limited only by the claims. When used in the following
claims, the terms "comprise", "include", "have" and their conjugates mean "including but not
limited to".
CLAIMS



1. A multi-database query system which queries a plurality of databases and servers,
comprising:

an input which receives queries in a structured form; and

a translation server which translates at least a part of a received query into commands
recognized by a data manipulation server.

2. A system according to claim 1, comprising a processor which parses the received query
into parts according to the databases and servers to which they relate.

3. A system according to claim 1, wherein the structured form comprises a form used to
query databases.


4. A system according to claim 1, wherein the input receives a query which relates to at
least one database and at least one data manipulation server.

5. A system according to claim 1, wherein the translation server models results from the
data manipulation server into database objects.


6. A system according to claim 1, wherein the data manipulation server comprises a server
which receives input from at least two different sources.

7. A system according to claim 6, wherein the data manipulation server comprises a
homology comparison engine.

8. A method of accessing a data manipulation server from a multi-database query system,
comprising:

providing the query system with a query which includes a first directive assigning a
value to at least one field of an input object associated with the data manipulation server and a
second directive which determines a value of at least one field of an output object associated
with the data manipulation server; and

invoking the data manipulation server responsive to the second directive.
9. A method according to claim 8, wherein providing the query comprises preparing the
query using a graphical interface designed for querying structured databases.

10. A method according to claim 8, wherein the data manipulation server comprises a
homology engine.

11. A method of performing a database search using a multi-database query system,
comprising:

providing the query system with a query which includes at least one directive related to
a database and at least one directive related to a data manipulation server, wherein the
directives are stated in an identical structural format;

translating the directives into commands recognized by the database and the data
manipulation server; and

submitting the commands respectively to the data manipulation server and to the
database.

12. A method according to claim 11, wherein the data manipulation server comprises a
homology comparison engine.

13. A method according to claim 11, wherein translating the directives comprises
identifying, by a query processor, the directives directed to the database and the directives
directed to the data manipulation server.

14. A method according to claim 13, wherein translating the directives comprises passing
the directives to translation servers associated with the database or data manipulation server to
which the directives are directed.

15. A method according to claim 13, comprising determining an order for the directives to
be processed in and submitting the translated directives to the data manipulation server and to
the database according to the determined order.
16. A method according to claim 11, comprising receiving results from said submission and
translating the results into structured objects.

17. A method according to claim 16, wherein translating the results into structured objects
comprises translating the results to structured objects related to the directives.

18. A method according to claim 11, wherein providing a query comprises providing a
query in an Object Protocol Model (OPM)-like language.

SCHEMA blast_srv

DESCRIPTION: "The OPM schema for a queryable blast server"

CONTROLLED VALUE CLASS BlastEngine_Cv
{ "wu_blast 2.0", "ncbi_blast 2.0" }
DEFAULT: "wu_blast 2.0"

CONTROLLED VALUE CLASS BlastProgram_Cv
{ "blastn", "blastx", "blastp", "tblastn", "tblastx" }
DEFAULT: "blastn"

CONTROLLED VALUE CLASS Strand_Cv
{ "top", "bottom", "both" }
DEFAULT: "both"

CONTROLLED VALUE CLASS SortBy_Cv
{ "pvalue", "count", "highscore", "totalscore" }
DEFAULT: "pvalue"

CONTROLLED VALUE CLASS GenCode_Cv
{ ("Standard or Universal", 1),
  ("Vertebrate Mitochondrial", 2),
  ("Yeast Mitochondrial", 3),
  ("Mold, Protozoan, ", 4),
  ("Invertebrate Mitochondrial", 5),
  ("Ciliate Macronuclear", 6),
  ("Echinoderm Mitochondrial", 9),
  ("Alternative Ciliate Macronuclear", 10),
  ("Eubacterial", 11),
  ("Alternative Yeast", 12),
  ("Ascidian Mitochondrial", 13),
  ("Flatworm Mitochondrial", 14)
}
DEFAULT: "Standard or Universal"
CODE_TYPE: SMALLINT



CONTROLLED VALUE CLASS Filter_Cv
{ ("none", 0),
  ("seg", 1),
  ("xnu", 2),
  ("seg+xnu", 3),
  ("dust", 4)
}
DEFAULT: "none"
CODE_TYPE: SMALLINT

CONTROLLED VALUE CLASS Matrix_Cv




{ ("blosum62", 0),
  ("blosum35", 1),
  ("blosum40", 2),
  ("blosum45", 3),
  ("blosum50", 4),
  ("blosum65", 5),
  ("blosum70", 6),
  ("blosum75", 7),
  ("blosum80", 8),
  ("blosum85", 9),
  ("blosum95", 10),
  ("blosum100", 11),
  ("GONNET", 12),
  ("pam10", 13),
  ("pam20", 14),
  ("pam30", 15),
  ("pam40", 16),
  ("pam50", 17),
  ("pam60", 18),
  ("pam70", 19),
  ("pam80", 20),
  ("pam90", 21),
  ("pam100", 22),
  ("pam110", 23),
  ("pam120", 24),
  ("pam130", 25),
  ("pam140", 26),
  ("pam150", 27),
  ("pam160", 28),
  ("pam170", 29),
  ("pam180", 30),
  ("pam190", 31),
  ("pam200", 32),
  ("pam210", 33),
  ("pam220", 34),
  ("pam230", 35),
  ("pam240", 36),
  ("pam250", 37),
  ("pam260", 38),
  ("pam270", 39),
  ("pam280", 40),
  ("pam290", 41),
  ("pam300", 42),
  ("pam310", 43),
  ("pam320", 44),
  ("pam330", 45),
  ("pam340", 46),
  ("pam350", 47),
  ("pam360", 48),
  ("pam370", 49),
  ("pam380", 50),
  ("pam390", 51),
  ("pam400", 52),
  ("pam410", 53),
  ("pam420", 54),
  ("pam430", 55),
  ("pam440", 56),
  ("pam450", 57)
}
DEFAULT: "blosum62"
CODE_TYPE: SMALLINT

CONTROLLED VALUE CLASS DB_Cv
{ "testdb", "localdb", "dbest" }
DEFAULT: "testdb"

OBJECT CLASS Blast_Call

DESCRIPTION: "A blast call object represents a particular homology search
using a blast engine"
ID: callId

ATTRIBUTE callId : INTEGER REQUIRED
ATTRIBUTE engine : BlastEngine_Cv REQUIRED
ATTRIBUTE program : BlastProgram_Cv REQUIRED
ATTRIBUTE query : VARCHAR(2000) REQUIRED
ATTRIBUTE datasource: DB_Cv REQUIRED
ATTRIBUTE output: set-of [1,] Blast_Output REQUIRED
ATTRIBUTE matrix: Matrix_Cv OPTIONAL
ATTRIBUTE strand: Strand_Cv OPTIONAL
ATTRIBUTE sortby: SortBy_Cv OPTIONAL
ATTRIBUTE dbgcode: GenCode_Cv OPTIONAL
ATTRIBUTE filter: Filter_Cv OPTIONAL
ATTRIBUTE threshold: REAL OPTIONAL
ATTRIBUTE alignments: INTEGER OPTIONAL
ATTRIBUTE scores: INTEGER OPTIONAL
ATTRIBUTE param_E: REAL OPTIONAL
ATTRIBUTE param_S: REAL OPTIONAL
ATTRIBUTE param_E2: REAL OPTIONAL
ATTRIBUTE param_S2: REAL OPTIONAL
ATTRIBUTE param_W: INTEGER OPTIONAL
ATTRIBUTE param_T: INTEGER OPTIONAL
ATTRIBUTE param_X: INTEGER OPTIONAL
ATTRIBUTE param_N: INTEGER OPTIONAL
ATTRIBUTE param_M: INTEGER OPTIONAL
ATTRIBUTE param_B: INTEGER OPTIONAL
ATTRIBUTE param_V: INTEGER OPTIONAL

OBJECT CLASS Blast_Output

DESCRIPTION: "The output of a specific blast call"
ID: runId

ATTRIBUTE runId: INTEGER REQUIRED
ATTRIBUTE program: VARCHAR(8) REQUIRED
ATTRIBUTE version: VARCHAR(20) REQUIRED
ATTRIBUTE revision: VARCHAR(20) REQUIRED
ATTRIBUTE build: VARCHAR(40) REQUIRED
ATTRIBUTE queryId : VARCHAR(20) REQUIRED
ATTRIBUTE querySeq : VARCHAR(2000) REQUIRED
ATTRIBUTE queryLength: INTEGER REQUIRED
ATTRIBUTE database : DB_Cv REQUIRED
ATTRIBUTE hits: set-of [1,] BlastHits REQUIRED

ATTRIBUTE dbSize_Seqs : INTEGER REQUIRED
ATTRIBUTE dbSize_Letters : INTEGER REQUIRED
ATTRIBUTE dbFile : VARCHAR(80) REQUIRED
ATTRIBUTE dbReleased : VARCHAR(40) REQUIRED
ATTRIBUTE dbPosted : VARCHAR(40) REQUIRED

15 ATTRIBXJIEhitSaffi^*: INTEGER REQUIRED 

ATTRIBUTE searchTime : VARCHAR(40) REQUIRED
ATTRIBUTE totalTime : VARCHAR(40) REQUIRED
ATTRIBUTE runDate : VARCHAR(40) REQUIRED
ATTRIBUTE parameters: set-of [1,] OutputParameters REQUIRED

OBJECT CLASS OutputParameters
ID: paramId

ATTRIBUTE paramId: INTEGER REQUIRED
ATTRIBUTE strand: VARCHAR(10) REQUIRED
ATTRIBUTE frame: VARCHAR(10) REQUIRED
ATTRIBUTE matrixId: VARCHAR(10) REQUIRED
ATTRIBUTE luatrbcNamez^yARS^i^^ 
ATTRIBUTE lamdba_Used: VARCHAR(10) REQUIRED
ATTRIBUTE K_Used: VARCHAR(10) REQUIRED
ATTRIBUTE H_Used: VARCHAR(10) REQUIRED
ATTRIBUTE lamdba_Computed: VARCHAR(10) REQUIRED
ATTRIBUTE K_Computed: VARCHAR(10) REQUIRED
ATTRIBUTE H_Computed: VARCHAR(10) REQUIRED
ATTRIBUTE param_E1: VARCHAR(10) REQUIRED
ATTRIBUTE param_S1: VARCHAR(10) REQUIRED
ATTRIBUTE param_W1: VARCHAR(10) REQUIRED
ATTRIBUTE param_T1: VARCHAR(10) REQUIRED
ATTRIBUTE param_X1: VARCHAR(10) REQUIRED
ATTRIBUTE param_E2: VARCHAR(10) REQUIRED
ATTRIBUTE param_S2: VARCHAR(10) REQUIRED

OBJECT CLASS BlastHeader

DESCRIPTION: "The header section of BLAST output"

ID: headerId

ATTRIBUTE headerId: INTEGER REQUIRED
ATTRIBUTE program: VARCHAR(8) REQUIRED
ATTRIBUTE version: VARCHAR(20) REQUIRED
ATTRIBUTE revision: VARCHAR(20) REQUIRED
ATTRIBUTE build: VARCHAR(40) REQUIRED
ATTRIBUTE queryId : VARCHAR(20) REQUIRED
ATTRIBUTE querySeq : VARCHAR(2000) REQUIRED
ATTRIBUTE database : DB_Cv REQUIRED
ATTRIBUTE numOfSequences : INTEGER REQUIRED
ATTRIBUTE numOfLetters : INTEGER REQUIRED

OBJECT CXASS BiBstmts 
DESCRIFnON: "Blast Hits- 
ID: accession 

10 ATTRIBUTE accession : VARCHAR(12) REQUIRED 

ATTRIBUTE description : VARCHAR(255) REQUIRED 

ATTRIBUTE score : INTEGER REQUIRED 

ATTRIBUTE pvalue : REAL REQUIRED 

ATTRIBUTE mm : INTEQER REQUIRED 
1 5 ATTRIBUTE length : INTEGER OPTIONAL 

ATTRIBUTE hsp : set-of [IJ BlasfflSP OPTIONAL 

OBJECT CLASS BlastHSP
ID: hspId

ATTRIBUTE hspId : INTEGER REQUIRED
ATTRIBUTE score : INTEGER REQUIRED
ATTRIBUTE expect: REAL REQUIRED
ATTRIBUTE pvalue: REAL REQUIRED
ATTRIBUTE strand1 : VARCHAR(1) REQUIRED
ATTRIBUTE strand2 : VARCHAR(1) REQUIRED
ATTRIBUTE identities : REAL REQUIRED
ATTRIBUTE positives : REAL REQUIRED
ATTRIBUTE query (sequence, begin, end) :
  (VARCHAR(500) REQUIRED, INTEGER REQUIRED, INTEGER REQUIRED)
ATTRIBUTE target (sequence, begin, end) :
  (VARCHAR(500) REQUIRED, INTEGER REQUIRED, INTEGER REQUIRED)
ATTRIBUTE align : VARCHAR(500) REQUIRED
ATTRIBUTE t5 J><^ : INTEGER REQUIRED 
ATTRIBUTE tsl_end : INTEGER REQUIRED
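To show how the schema's controlled value classes and defaults might be mirrored in application code, the following Python sketch renders Filter_Cv as an IntEnum (its CODE_TYPE is SMALLINT) and a subset of Blast_Call as a validated dataclass. This is a hypothetical rendering, not part of the OPM tools; the Python names (FilterCv, BlastCall, filter_code, call_id) are assumptions, and only a few attributes are covered.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class FilterCv(IntEnum):
    # Codes follow the Filter_Cv listing; member names adapted to Python identifiers.
    NONE = 0
    SEG = 1
    XNU = 2
    SEG_XNU = 3
    DUST = 4


FILTER_LABELS = {"none": FilterCv.NONE, "seg": FilterCv.SEG, "xnu": FilterCv.XNU,
                 "seg+xnu": FilterCv.SEG_XNU, "dust": FilterCv.DUST}

BLAST_ENGINES = ("wu_blast 2.0", "ncbi_blast 2.0")
BLAST_PROGRAMS = ("blastn", "blastx", "blastp", "tblastn", "tblastx")
DATABASES = ("testdb", "localdb", "dbest")


@dataclass
class BlastCall:
    """Subset of Blast_Call; defaults mirror the DEFAULT entries of the value classes."""
    call_id: int
    query: str
    engine: str = "wu_blast 2.0"
    program: str = "blastn"
    datasource: str = "testdb"
    filter_code: FilterCv = FilterCv.NONE
    threshold: Optional[float] = None

    def __post_init__(self) -> None:
        # Enforce the controlled vocabularies and the VARCHAR(2000) bound on query.
        if self.engine not in BLAST_ENGINES:
            raise ValueError(f"unknown engine {self.engine!r}")
        if self.program not in BLAST_PROGRAMS:
            raise ValueError(f"unknown program {self.program!r}")
        if self.datasource not in DATABASES:
            raise ValueError(f"unknown datasource {self.datasource!r}")
        if len(self.query) > 2000:
            raise ValueError("query exceeds VARCHAR(2000)")


call = BlastCall(call_id=1, query="ACGTACGT", filter_code=FILTER_LABELS["seg+xnu"])
```

Storing the controlled value as an IntEnum keeps the SMALLINT code and the human-readable label in one place, which matches how the schema pairs each label with a numeric code.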